GLMM Maternal Mortality Federally- Summer 2025

This is a Report Template

Author

Carolyn Herrera & Catherine Funte (Advisor: Dr. Cohen)

Published

July 31, 2025

Slides: slides.html ( Go to slides.qmd to edit)

Important

Remember: Your goal is to make your audience understand and care about your findings. By crafting a compelling story, you can effectively communicate the value of your data science project.

Carefully read this template since it has instructions and tips to writing!

Introduction

Generalized Linear Mixed Models (GLMMs) are a flexible class of statistical models that combine the features of two powerful tools: Generalized Linear Models (GLMs) and Mixed-Effects Models(Agresti 2015). Like GLMs, GLMMs can model non-normal outcome variables, such as binary, count, or proportion data. However, they go a step further by incorporating random effects, which account for variation due to grouping or clustering in the data, correlated observations, and overdispersion. The GLMM method can work with complex data structures such as temporal correlation, spatial correlation , nested data, heterogeneity of variance, and repeated measurement(Zuur et al. 2009).

In practical terms, GLMMs are especially useful when data points are not independent, such as when students are nested within schools, patients are treated within hospitals, or repeated measures are taken from the same subject over time. For example, Thall wrote that issues with longitudinal clinical trial basic count data from repeated measures taken from the same subject over time will have problems detecting comparable between subject outcomes because it can be difficult to determine if outcomes are time dependent or due to treatment groups, thus a general linear mixed model method may be utilized to represent dependence upon each patient, incorporate covariate data, create time as a function, account for variability between patients,and be flexible and tractable (Thall 1988). The random effects help model the correlation within clusters and allow for unobserved heterogeneity—differences that are not captured by the measured covariates.

GLMMs are good for:

  • Handling hierarchical or grouped data (e.g., students within classrooms, patients within clinics)(Lee and Nelder 1996)

  • Modeling non-normal outcomes, such as:

  • Improving inference by accounting for both fixed effects (predictors of interest) and random effects (random variation across groups)

  • Reducing bias and inflated Type I error rates that can result from ignoring data structure(Thompson et al. 2022)

GLMMs are ideal when your data is both complex in structure and involves non-Gaussian response variables, making them indispensable in fields like medicine, ecology, education, and social sciences. Tawiah et al describes zero-inflated Poisson GLMMs, an extension of Poisson GLMM that allows for overdispersion due to a prevalence of zeros in the data, common in health sector data(Tawiah, Iddi, and Lotsi 2020). The paper compares a Poisson GLM, a zero-inflated Poisson GLM, a Poisson GLMM, and a zero-inflated Poisson GLMM, applied to clustered maternal mortality data. Another paper by Owili et al utilizes a GLMM to investigate the impact of particulate matter on maternal and infant mortality globally (Owili et al. 2020). They use a Poisson link function and take year and country as random effects to account for differences in global data.

We wish to analyze the data of federal maternal mortality deaths via VSRR Provisional Maternal Death Counts and Rates dataset by utilizing a General linear mixed model with Poisson link as it is count data. We wish to see if ethnicity (a fixed effect) has any influence upon maternal death per live births(number of live births per 12 month period or year) count by year(random effect). Like other public health or clinical data there will be issues such as correlated observations and overdispersion but GLMM will be utilized to parse through the noise and determine if indeed there are some patterns of maternal mortality among mothers of differing ethnic ties.

Methods

Math Background

GLMMs can be considered an extension of GLMs, wherein a GLM includes the addition of random effects, or an extension of Linear Mixed Models (LMMs), where a linear model with fixed and random effects is extended for non-normal distributions(Salinas Ruı́z et al. 2023). Let

  • \(\mathbf{y}\) be a \(Nx1\) column vector outcome variable

  • \(\mathbf{X}\) be a \(Nxp\) matrix for the \(p\) predictor variables

  • \(\boldsymbol{\beta}\) be a \(px1\) column vector of the fixed effects coefficients

  • \(\mathbf{Z}\) is a \(Nxq\) matrix of the \(q\) random effects

  • \(\mathbf{u}\) is a \(qx1\) vector of random effects, and

  • \(\boldsymbol{\epsilon}\) is a \(Nx1\) column vector of the residuals

Then the general equation for the model is given by:

\[\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}{u}+\boldsymbol{\epsilon}\] (Salinas Ruı́z et al. 2023). The GLMM Model process is that the analyis of variance model or the equation is a vector of linear predictors with of unknown parameters estimates. Each distribution has is its own probability function which we will utilize the Negative Binomial as GLMMs typically include a link function that relates the response variable \(\mathbf{y}\) to a linear predictor, \(\eta\), which excludes the residuals. So then \[\boldsymbol{\eta}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\boldsymbol{\lambda}\] The link function is \(g(\cdot)\), where \[g(E(\mathbf{y}))=\boldsymbol{\eta}\] where \(E(\mathbf{y})\) is the expectation of . The choice of link function depends on the outcome distribution. For this paper our data demonstrates a Negative Binomial distribution for overdispered count data, so we will use a log link function.
\[g(\cdot)=log_e(\cdot)\] In the GLMM model the parameter estimates is solved by reducing the negative log likelihood functions(Salinas Ruı́z et al. 2023). The means or the least square means are derivative of the parameter estimates and are found on the model scale. The link function, negative binomial log link, will convert the mean estimates at the model scale to the data scale. Negative Binomial Distribution: \[ f(y;k,{\mu})=\frac{\Gamma(y+k)}{\Gamma(k)*(y+1)}\left(\frac{k}{\mu+k}\right)^{k}\left(1-\frac{k}{\mu+k}\right)^{y} \] Zuur writes that the negative binomial Distribution has two parameters \({\mu}\) and \(k\) (Zuur et al. 2009). The symbol \({\Gamma}\) is defined as \({\Gamma(y+1)=(y+1)!}\) The Mean of Negative Binomial is given: \(E(Y)= {\mu}\) The Variance of Negative binomial is given; \(Var(Y)= {\mu}+ \left(\frac{\mu^2}{k}\right)\), where second term determines the overdispersion, \(k\) is called the dispersion parameter and indirectly determines overdispersion. If \(k\) is significantly large relative to \({\mu^2}\) then the second term will approximate to zero and a Poisson distribution may as well be used. However, the smaller the \(k\) value the larger the overdispersion may form and then negative binomial is the correct log link to utilize.

GLMM model process is

Assumptions

Before deciding to use a GLMM for our data, we had to check some assumptions (specific to our negative binomial distributed data).

  • The response variable and the predictors have a linear relationship within the levels of random effects.
  • The response variable is assumed to follow a negative binomial distribution, with \(\sigma^2>\mu\).
  • The residuals and random effects are independent.
  • The random effects are assumed to be normally distributed, with mean 0 and variance \(\sigma\).

Model Choice

The Negative Binomial distribution’s quadratic mean-variance relationship is ideal for overdispersed count data that the Poisson distribution cannot accommodate. Overdispersion is a given for populations due to their heterogeneity that aggregation processes such as community clusters form with shared traits such as ethnicity. Our data includes community cluster groups such as ethnicity and age groups. The negative binomial model is desirable also due to its easily interpretable dispersion parameter as a measure of aggregation and its tractability due to closed form expression of probability mass function which helps in convenient model inference and estimation. Since our data is longitudinal, a Negative Binomial GLMM is ideal since measurements are taken over time. We incorporate Year as a random effect because it could explain for variation in our model which could not be explained by our fixed effects such as ethnicity or age group.

We perform the analysis with R(R Core Team 2025) and the packages lme4{Bates et al. (2015)} with function glmer to investigate poisson model or glmer.nb() for negative binomial model and the package glmmTMB{McGillycuddy et al. (2025)} with the function, nbinom2(), for negative binomial model. The approximate parameter estimation method uses for glmmTMB uses AGQ or Laplace approximation with Wald Z significance test. The default approximate parameter estimation method for lme4 is Laplace approximation when no quadrature point is used but is pseudo-likelihood method in linearization,and integral approximation and another estimation method is Gauss-Hermite quadrature to approximate the log-likelihood using numerical integration(Salinas Ruı́z et al. 2023).

Densities are seen as counts per volume and can be modeled with NB and offset variable(Zuur et al. 2009). We are investigating maternal deaths per live births and while we can use the variable Maternal Mortality rate, it would be less noisy if we used the offset variable log(Live Births). We can also model without the offset and utilize Akaike information Criterion,AIC, to see which is the better model as the smaller the AIC would indicate a better model. A random effect could be month,year, or month/year. Fixed effects could be age groups, ethnicity, and perhaps year. Our choice of random or fixed effects are dependent on the question we which to investigate.

Analysis and Results

Data Exploration and Visualization

The data we used comes from the National Vital Statistics System, an organization that provides national data about births, deaths, marriages, divorces, and fetal deaths. It is a collaboration between the National Center for Health Statistics (NCHS) (a part of the CDC) and state vital records offices. {create citation https://www.cdc.gov/nchs/nvss/about_nvss.htm} This dataset is the Vital Statistics Rapid Release (VSRR) Provisional Maternal Death Counts and Rates, in the form of a .csv file. The dataset contains monthly death counts and death rates from 2019 to 2024 by race and ethnicity, age and overall. Rates are maternal deaths per 100,000 live births. For this dataset, a maternal death is defined as “the death of a woman while pregnant or within 42 days of termination of pregnancy irrespective of the duration and the site of the pregnancy, from any cause related to or aggravated by the pregnancy or its management, but not from accidental or incidental causes.” {create citation https://www.cdc.gov/nchs/nvss/vsrr/provisional-maternal-deaths-rates.htm}

  • Present initial findings and insights through visualizations.

  • Highlight unexpected patterns or anomalies.

Code
# loading packages 
library(tidyverse)
library(knitr)
library(ggthemes)
library(ggrepel)
library(dslabs)
library(ggplot2)
library(patchwork) # For combining plots
library(ggpubr)
library(DataExplorer) #For EDA
options(repos = c(CRAN = "https://cloud.r-project.org"))#specify cran mirror
if (!require("lme4")) install.packages("lme4")#if required to install
library(lme4)
##to check if model is overdispersed'
if (!require("performance")) install.packages("performance")#if required to install
library(performance)
##Overdispersion is detected with Poisson model thus we must use negative binomial model
if (!require("glmmTMB")) install.packages("glmmTMB")#if required to install
library(glmmTMB)
Code
df <- read.csv("data/VSRR_Provisional_Maternal_Death_Counts_and_Rates_20250726.csv")#loading data
#variables- Maternal.Deaths--outcome variable
#Live.Births---offset variable since it is maternal deaths per live births
#Subgroup- ethnicity and year of mother must be edited --- these are fixed
#Year.of.Death---random effect variable
#Summary
#renaming columns
df<-df%>% 
  rename(Year= Year.of.Death, Month= Month.of.Death, Maternal_Deaths= Maternal.Deaths, Live_Births= Live.Births, Maternal_Mortality_Rate= Maternal.Mortality.Rate, Time_Period= Time.Period, Month_Ending_Date= Month.Ending.Date, Data_As_Of= Data.As.Of) 
str(df) 
'data.frame':   720 obs. of  12 variables:
 $ Data_As_Of             : chr  "04/06/2025" "04/06/2025" "04/06/2025" "04/06/2025" ...
 $ Jurisdiction           : chr  "United States" "United States" "United States" "United States" ...
 $ Group                  : chr  "Total" "Total" "Total" "Total" ...
 $ Subgroup               : chr  "Total" "Total" "Total" "Total" ...
 $ Year                   : int  2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
 $ Month                  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Time_Period            : chr  "12 month-ending" "12 month-ending" "12 month-ending" "12 month-ending" ...
 $ Month_Ending_Date      : chr  "01/31/2019" "02/28/2019" "03/31/2019" "04/30/2019" ...
 $ Maternal_Deaths        : int  660 653 657 668 706 720 721 723 729 739 ...
 $ Live_Births            : int  3787776 3783489 3771682 3772235 3767999 3757275 3761912 3758847 3761856 3760121 ...
 $ Maternal_Mortality_Rate: num  17.4 17.3 17.4 17.7 18.7 19.2 19.2 19.2 19.4 19.7 ...
 $ Footnote               : chr  "" "" "" "" ...
Code
filtered_df<- df%>% filter(Maternal_Deaths > 0, Live_Births > 100) #Filtering out rare combinations

test_df <- filtered_df%>%
  select(-Maternal_Mortality_Rate)
  na.omit
function (object, ...) 
UseMethod("na.omit")
<bytecode: 0x2caab68>
<environment: namespace:stats>
Code
#Summary Statistics

deaths_df <-test_df %>%
  mutate(
    Ethnicity = case_when(
      Group == "Race and Hispanic origin" ~ Subgroup,
      TRUE ~ NA_character_
    ),
    Age_Group = case_when(
      Group == "Age" ~ Subgroup,
      TRUE ~ NA_character_
    ),
    Is_Total = Subgroup == "Total"
  ,
  Year = as.factor(Year),
  Month= as.factor(Month),
  Ethnicity=as.factor(Ethnicity),
  Age_Group=as.factor(Age_Group))#mutating groups

deaths_df_ethnic<-deaths_df %>%
  filter(!is.na(Ethnicity))%>%
  mutate(Ethnicity = Subgroup)###filter out is total if ethnicity or age group are na

deaths_df_age <-deaths_df %>%
  filter(!is.na(Age_Group)) %>%
  mutate(Age_Group = Subgroup)


merged_deaths_df <-  bind_rows(deaths_df_ethnic, deaths_df_age)  #merge both df

#setting factor levels
deaths_df3 <- merged_deaths_df %>%
  mutate(
    Ethnicity = factor(Ethnicity, levels = c(
      "Asian, Non-Hispanic", "Black, Non-Hispanic", "White, Non-Hispanic",
      "Hispanic", "American Indian or Alaska Native, Non-Hispanic",
      "Native Hawaiian or Other Pacific Islander, Non-Hispanic", "Unknown"
    )),
    Age_Group = factor(Age_Group, levels = c("Under 25 years", "25-39 years", "40 years and over", "Unknown"))
  )

#mutating any NAs with unknown
deaths_df3<- deaths_df3 %>%
   mutate(
    Ethnicity = replace_na(Ethnicity, "Unknown"),
    Age_Group = replace_na(Age_Group, "Unknown")
  )

# make variable in which we combine Year and Month into a for of a date to create later variable, or use month ending date?
deaths_df3$Date <- as.Date(paste(deaths_df3$Year, deaths_df3$Month, "01", sep = "-"))

# Dobbs was decided June 24, 2022
cutoff <- as.Date("2022-06-24")##when Roe v.Wade was overturned

deaths_df3$Dobbs_Era <- ifelse(deaths_df3$Date < cutoff, "Pre-Dobbs", "Post-Dobbs")
deaths_df3$Dobbs_Era <- factor(deaths_df3$Dobbs_Era, levels = c("Pre-Dobbs", "Post-Dobbs"))


df_age_year <- deaths_df %>%
  filter(!is.na(Age_Group), !is.na(Year)) %>%
  group_by(Year, Age_Group) %>%
  summarise(Maternal_Deaths = sum(Maternal_Deaths, na.rm = TRUE))
df_age_year
# A tibble: 18 × 3
# Groups:   Year [6]
   Year  Age_Group         Maternal_Deaths
   <fct> <fct>                       <int>
 1 2019  25-39 years                  6017
 2 2019  40 years and over            1139
 3 2019  Under 25 years               1320
 4 2020  25-39 years                  6824
 5 2020  40 years and over            1361
 6 2020  Under 25 years               1312
 7 2021  25-39 years                  8338
 8 2021  40 years and over            1975
 9 2021  Under 25 years               1558
10 2022  25-39 years                  9191
11 2022  40 years and over            2035
12 2022  Under 25 years               1796
13 2023  25-39 years                  6262
14 2023  40 years and over            1158
15 2023  Under 25 years               1134
16 2024  25-39 years                  5610
17 2024  40 years and over            1252
18 2024  Under 25 years               1386
Code
df_total_year <-deaths_df%>%
  filter(Group=="Total") %>%
  group_by(Year, Is_Total) %>%
  summarise(Maternal_Deaths = sum(Maternal_Deaths, na.rm = TRUE))
df_total_year
# A tibble: 6 × 3
# Groups:   Year [6]
  Year  Is_Total Maternal_Deaths
  <fct> <lgl>              <int>
1 2019  TRUE                8482
2 2020  TRUE                9503
3 2021  TRUE               11871
4 2022  TRUE               13022
5 2023  TRUE                8554
6 2024  TRUE                8248
Code
#deaths_cleaned <- deaths_df3 %>%
#  filter(Age_Group != "Unknown",Ethnicity != "Unknown") %>%
 # droplevels()

#deaths_df2<-deaths_df %>%
#  filter(!(Is_Total & (is.na(Ethnicity) | is.na(Age_Group))))###filter out is total if ethnicity or age when total is true

The below chart gives an idea of the missing values in the dataset. It seems like there are many NAs in ethnicity and age group, certainly more than is acceptable. The dataset indicates that Maternal Deaths between 1 and 9 are suppressed for privacy reasons. With more investigation, it appears that the racial/ethnic group “Native Hawaiian or Other Pacific Islander, Non-Hispanic” has NA for all but 2 Maternal Deaths values. We decided to omit this subgroup entirely. There are missing Maternal Mortality Rate entries in “American Indian or Alaska Native, Non-Hispanic”, which will not be an issue as we have decided not to use rate in our model ( though we will examine it during our Exploratory Data Analysis). Finally, there are six NA Maternal Deaths also in the “American Indian or Alaska Native, Non-Hispanic” category, a number that is reasonable to omit.

Code
plot_missing(deaths_df)

Code
plot_missing(test_df)

Since our data is longitudinal, it is easy to visualize with graphs. Below is a bar chart that shows the distibution of Maternal Deaths by racial/ethnic group. The counts indicated the sum of the Maternal Deaths for each group over the span of the data (2019-2024).

Code
#Data Exploration Graph 1
colors=c('yellow','orange', 'blue', 'red', 'green')
df_ethnicity <- deaths_df %>%
    filter(!is.na(Ethnicity)) %>%
    group_by(Ethnicity) %>%
    summarise(Maternal_Deaths = sum(Maternal_Deaths, na.rm = TRUE))
barplot(df_ethnicity$Maternal_Deaths,names.arg=c('Am In/Ala', 'Asian, NH','Black, NH','Hispanic','White,NH'),main="Maternal Mortality Totals 2019-2024 by Ethnicity",col=colors,xlab='Ethnicity',ylab='Maternal Deaths')

This graph shows the total Maternal Deaths for each year. The counts for year are the sum of the monthly data.

Code
#Data Exploration Graph 2
df_tot_year_ts<-ts(df_total_year$Maternal_Deaths, start = c(2019), frequency = 1)
plot.ts(df_tot_year_ts,main='Yearly Total Maternal Deaths 2019-2024',xlab='Year',ylab='Maternal Deaths',col='navy')

Code
df_tot_year_rate <- filtered_df%>%
  filter(Group=='Total')%>%
  select(Maternal_Mortality_Rate)
df_tot_year_rate_ts<-ts(df_tot_year_rate$Maternal_Mortality_Rate, start = c(2019,1), frequency = 12)
plot.ts(df_tot_year_rate_ts,main='Monthly Total Maternal Mortality Rate 2019-2024',xlab='Year',ylab='Maternal Deaths per 100,000 Live Births',col='navy')

Code
total_df <- deaths_df%>%
  filter(Group=='Total')
total_ts <- ts(total_df$Maternal_Deaths,start=c(2019,1),frequency=12)
plot.ts(total_ts,main='Monthly Total Maternal Deaths 2019-2024',ylab='Maternal Deaths',xlab='Time',col='navy')  

Code
His <- filtered_df%>%
  filter(Subgroup=='Hispanic')%>%
  select(Maternal_Deaths)%>%
  rename(Hispanic=Maternal_Deaths)
bl <- filtered_df%>%
  filter(Subgroup=='Black, Non-Hispanic')%>%
  select(Maternal_Deaths)%>%
  rename('Black_Non-Hispanic' = Maternal_Deaths)
wh <- filtered_df%>%
  filter(Subgroup=='White, Non-Hispanic')%>%
  select(Maternal_Deaths)%>%
  rename('White_Non-Hispanic'=Maternal_Deaths)
as <- filtered_df%>%
  filter(Subgroup=='Asian, Non-Hispanic')%>%
  select(Maternal_Deaths)%>%
  rename('Asian_Non-Hispanic'=Maternal_Deaths)
aina <- df%>%
  filter(Subgroup=='American Indian or Alaska Native, Non-Hispanic')%>%
  select(Maternal_Deaths)%>% 
  rename('American_Indian_or_Alaska_Native, Non-Hispanic'=Maternal_Deaths)
ethn_ts <- bind_cols(His,bl,wh,as,aina) 
ethn_ts <- ts(ethn_ts, start=c(2019,1), frequency = 12)
colors=c('red','blue','green','orange','yellow')
plot.ts(ethn_ts, plot.type='single',main='Monthly Maternal Deaths by Ethnicity 2019-2024', ylab='Maternal Deaths',xlab='Time',col=colors)
legend("topleft",                     
       legend = c("Hispanic", "Black, Non-Hispanic", "White, Non-Hispanic",'Asian, Non-Hispanic','American Indian, Alaska Native, Non-Hisp'),  # Labels
       col = colors,                   
       lty = 1,                        
       lwd = 2,                     
       bty = "n",
       cex = 0.8)                      

Code
HisR <- filtered_df%>%
  filter(Subgroup=='Hispanic')%>%
  select(Maternal_Mortality_Rate)%>%
  rename(Hispanic=Maternal_Mortality_Rate)
blR <- filtered_df%>%
  filter(Subgroup=='Black, Non-Hispanic')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('Black_Non-Hispanic' = Maternal_Mortality_Rate)
whR <- filtered_df%>%
  filter(Subgroup=='White, Non-Hispanic')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('White_Non-Hispanic'=Maternal_Mortality_Rate)
asR <- filtered_df%>%
  filter(Subgroup=='Asian, Non-Hispanic')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('Asian_Non-Hispanic'=Maternal_Mortality_Rate)
ainaR <- df%>%
  filter(Subgroup=='American Indian or Alaska Native, Non-Hispanic')%>%
  select(Maternal_Mortality_Rate)%>% 
  rename('American_Indian_or_Alaska_Native, Non-Hispanic'=Maternal_Mortality_Rate)
ethnR_ts <- bind_cols(HisR,blR,whR,asR,ainaR) 
ethnR_ts <- ts(ethnR_ts, start=c(2019,1), frequency = 12)
colors=c('red','blue','green','orange','yellow')
plot.ts(ethnR_ts, plot.type='single',main='Monthly Maternal Mortality Rate by Ethnicity 2019-2024', ylab='Maternal Deaths per 100,000 Live Births',xlab='Time',col=colors)
legend("topleft",                     
       legend = c("Hispanic", "Black, Non-Hispanic", "White, Non-Hispanic",'Asian, Non-Hispanic','American Indian, Alaska Native, Non-Hisp'),  # Labels
       col = colors,                   
       lty = 1,                        
       lwd = 2,                     
       bty = "n",
       cex = 0.7)            

Code
ageu25 <- deaths_df%>%
  filter(Subgroup=='Under 25 years')%>%
  select(Maternal_Deaths)%>%
  rename('Under_25_years'=Maternal_Deaths)
age25 <- filtered_df%>%
  filter(Subgroup=='25-39 years')%>%
  select(Maternal_Deaths)%>%
  rename('25-39_years' = Maternal_Deaths)
ageo40 <- filtered_df%>%
  filter(Subgroup=='40 years and over')%>%
  select(Maternal_Deaths)%>%
  rename('40_years_and_over'=Maternal_Deaths)
age_ts <- bind_cols(ageu25,age25,ageo40) 
age_ts <- ts(age_ts, start=c(2019,1), frequency = 12)
colors=c('red','blue','green')
plot.ts(age_ts, plot.type='single',main='Monthly Maternal Deaths by Age Group 2019-2024', ylab='Maternal Deaths',xlab='Time',col=colors)
legend("topleft",                     
       legend = c("Under 25 Years", "25-39 Years", "40 Years and over"),  # Labels
       col = colors,                   
       lty = 1,                        
       lwd = 2,                     
       bty = "n",
       cex = 0.8) 

Code
ageu25R <- filtered_df%>%
  filter(Subgroup=='Under 25 years')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('Under_25_years'=Maternal_Mortality_Rate)
age25R <- filtered_df%>%
  filter(Subgroup=='25-39 years')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('25-39_years' = Maternal_Mortality_Rate)
ageo40R <- filtered_df%>%
  filter(Subgroup=='40 years and over')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('40_years_and_over'=Maternal_Mortality_Rate)
ageR_ts <- bind_cols(ageu25R,age25R,ageo40R) 
ageR_ts <- ts(ageR_ts, start=c(2019,1), frequency = 12)
colors=c('red','blue','green')
plot.ts(ageR_ts, plot.type='single',main='Monthly Maternal Mortality Rate by Age Group 2019-2024', ylab='Maternal Deaths per 100,000 live Births',xlab='Time',col=colors)
legend("topleft",                     
       legend = c("Under 25 Years", "25-39 Years", "40 Years and over"),  # Labels
       col = colors,                   
       lty = 1,                        
       lwd = 2,                     
       bty = "n",
       cex = 0.8) 

Code
plot_histogram(deaths_df_ethnic$Maternal_Deaths,title='Histogram of Maternal Deaths - Ethnicity')

Code
plot_histogram(deaths_df_age$Maternal_Deaths, title='Histogram of Maternal Deaths - Age')

Code
plot_histogram(total_df$Maternal_Deaths,title='Histogram of Maternal Deaths - Total')

Code
#check overdispersion by for ethnicity data
deaths_df_ethnic %>% summarize(mean(Maternal_Deaths),
var(Maternal_Deaths))
  mean(Maternal_Deaths) var(Maternal_Deaths)
1              165.5779              18843.1
Code
#check overdispersion for age data
deaths_df_age %>% summarize(mean(Maternal_Deaths),var(Maternal_Deaths))
  mean(Maternal_Deaths) var(Maternal_Deaths)
1              276.2407              54102.1
Code
#check overdispersion for total data
total_df %>% summarize(mean(Maternal_Deaths),
var(Maternal_Deaths))
  mean(Maternal_Deaths) var(Maternal_Deaths)
1              828.8889             30819.62

References

Agresti, A. 2015. Foundations of Linear and Generalized Linear Models. Wiley Series in Probability and Statistics. Wiley. https://books.google.com/books?id=jlIqBgAAQBAJ.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.
Candy, Steven G. 2000. “The Application of Generalized Linear Mixed Models to Multi-Level Sampling for Insect Population Monitoring.” Environmental and Ecological Statistics 7 (3): 217–38.
Lee, Youngjo, and John A Nelder. 1996. “Hierarchical Generalized Linear Models.” Journal of the Royal Statistical Society Series B: Statistical Methodology 58 (4): 619–56.
McGillycuddy, Maeve, David I. Warton, Gordana Popovic, and Benjamin M. Bolker. 2025. “Parsimoniously Fitting Large Multivariate Random Effects in glmmTMB.” Journal of Statistical Software 112 (1): 1–19. https://doi.org/10.18637/jss.v112.i01.
Owili, Patrick Opiyo, Tang-Huang Lin, Miriam Adoyo Muga, and Wei-Hung Lien. 2020. “Impacts of Discriminated PM2. 5 on Global Under-Five and Maternal Mortality.” Scientific Reports 10 (1): 17654.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Salinas Ruı́z, Josafhat, Osval Antonio Montesinos López, Gabriela Hernández Ramı́rez, and Jose Crossa Hiriart. 2023. Generalized Linear Mixed Models with Applications in Agriculture and Biology. Springer Nature.
Tawiah, Kassim, Samuel Iddi, and Anani Lotsi. 2020. “On Zero-Inflated Hierarchical Poisson Models with Application to Maternal Mortality Data.” International Journal of Mathematics and Mathematical Sciences 2020 (1): 1407320.
Thall, Peter F. 1988. “Mixed Poisson Likelihood Regression Models for Longitudinal Interval Count Data.” Biometrics, 197–209.
Thompson, Jennifer A, Clemence Leyrat, Katherine L Fielding, and Richard J Hayes. 2022. “Cluster Randomised Trials with a Binary Outcome and a Small Number of Clusters: Comparison of Individual and Cluster Level Analysis Method.” BMC Medical Research Methodology 22 (1): 222.
Wang, Ke-Sheng, Xuefeng Liu, Muyiwa Ategbole, Xin Xie, Ying Liu, Chun Xu, Changchun Xie, and Zhanxin Sha. 2017. “Generalized Linear Mixed Model Analysis of Urban-Rural Differences in Social and Behavioral Factors for Colorectal Cancer Screening.” Asian Pacific Journal of Cancer Prevention: APJCP 18 (9): 2581.
Zuur, Alain F, Elena N Ieno, Neil J Walker, Anatoly A Saveliev, Graham M Smith, et al. 2009. Mixed Effects Models and Extensions in Ecology with r. Vol. 574. Springer.